ABSTRACT

Precision photometric redshifts will be essential for extracting cosmological parameters from the 
next generation of wide-area imaging surveys. In this paper we introduce a photometric redshift 
algorithm, ArborZ, based on the machine-learning technique of Boosted Decision Trees. We study
the algorithm using galaxies from the Sloan Digital Sky Survey and from mock catalogs intended to 
simulate both the SDSS and the upcoming Dark Energy Survey. We show that it improves upon the 
performance of existing algorithms. Moreover, the method naturally leads to the reconstruction of a 
full probability density function (PDF) for the photometric redshift of each galaxy, not merely a single 
"best estimate" and error, and also provides a photo-z quality figure-of-merit for each galaxy that 
can be used to reject outliers. We show that the stacked PDFs yield a more accurate reconstruction 
of the redshift distribution N(z). We discuss limitations of the current algorithm and ideas for future 
work. 

Subject headings: galaxies: distances and redshifts - galaxies: statistics - large-scale structure of the
Universe - methods: statistical - methods: data analysis



1. INTRODUCTION 

Cosmic expansion makes the redshift of a distant object one of its most fundamental observables. The redshift allows us to estimate distances, and hence to place observed properties (e.g. fluxes) on a physical scale (e.g. luminosities). Whether interpreted as recession velocity or a measure of the change in the scale factor (Bunn & Hogg 2009), redshift is defined as the fractional increase in wavelength of the observed spectral energy distribution (SED), z = Δλ/λ. As such, it is measured by comparing observed SEDs of distant objects to those of objects nearby or to atomic and molecular features identified in the lab.

This comparison is relatively straightforward when the two SEDs are both physically similar and well measured, with high signal-to-noise and wavelength resolution adequate to resolve the relevant features. These conditions are often met in spectroscopic surveys, and these typically allow redshifts to be determined with great precision. For example, the ~10⁶ galaxy redshifts from the Sloan Digital Sky Survey (SDSS; York et al. 2000) have errors of Δz < 0.0002. Unfortunately, high-resolution spectroscopic data are costly to obtain. Spreading the light from an object into several thousand independent resolution elements typically requires exposures 50-100 times as long as those for broad-band images with the same signal-to-noise. Furthermore, high-resolution spectra imaged on detectors take up much more space than direct images, requiring slit masks, fiber feeds, or image slicers. These challenges have limited the scope of redshift surveys, so that the total number of galaxy spectroscopic redshifts so far measured remains of order a few million.

Many modern astrophysical measurements would benefit from substantially larger catalogs of redshifts, say 10⁸ or 10⁹. These include studies of galaxy evolution, galaxy cluster identification, large-scale structure and baryon acoustic oscillation measurements, identification of very high redshift objects, and gravitational lensing studies. Many of these studies would be well served by much more crudely measured redshifts, say Δz ~ 0.01. Such redshifts would, for example, allow a 2% measurement of the distance to a galaxy at z = 0.5. Cosmology-induced systematic uncertainty in the conversion from redshift to distance would then dominate uncertainty in determination of the galaxy's properties, making greater accuracy of little benefit for this purpose. For many applications, for example determination of the weak lensing source galaxy distribution, an accurate estimate of the redshift probability density function is as important as the accuracy of the redshift itself (Mandelbaum et al. 2008). In many cases, these PDFs are highly non-Gaussian, adding to the complexity of the problem.
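As a rough check of the 2% figure, the minimal Python sketch below propagates Δz = 0.01 into a fractional distance error at z = 0.5. It assumes a flat ΛCDM cosmology with Ω_m = 0.3 and uses the line-of-sight comoving distance; neither choice is specified in the text, and other parameters or distance measures shift the result slightly.

```python
import math


def hubble_ratio(z, omega_m=0.3):
    """E(z) = H(z)/H0 for a flat LambdaCDM model (omega_m = 0.3 is an assumed value)."""
    return math.sqrt(omega_m * (1.0 + z) ** 3 + (1.0 - omega_m))


def comoving_distance(z, omega_m=0.3, n=2000):
    """Comoving distance in units of c/H0: integral of dz'/E(z') from 0 to z (Simpson's rule, n even)."""
    h = z / n
    total = 1.0 / hubble_ratio(0.0, omega_m) + 1.0 / hubble_ratio(z, omega_m)
    for k in range(1, n):
        total += (4.0 if k % 2 else 2.0) / hubble_ratio(k * h, omega_m)
    return total * h / 3.0


def fractional_distance_error(z, dz, omega_m=0.3):
    """First-order propagation: dD/D = (dD/dz) dz / D, with dD/dz = (c/H0) / E(z)."""
    return dz / (hubble_ratio(z, omega_m) * comoving_distance(z, omega_m))


frac = fractional_distance_error(0.5, 0.01)
print(f"fractional distance error at z = 0.5: {frac:.3f}")  # about 0.017, i.e. roughly 2%
```

Because dD/dz falls as E(z) grows, the same Δz corresponds to a smaller fractional error at higher redshift.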

It has long been recognized that broadband imaging in several passbands provides a crude measurement of an object's SED (Baum 1962). In the era of wide-field CCD imaging, precise calibration of broadband photometry is possible, and the low-resolution SEDs measured in this way can be used to estimate redshifts. Early efforts to apply photometric redshifts to galaxy evolution (Koo 1981, 1985) and cosmology (Loh & Spillar 1986a,b) showed promise, but were limited by the need for precisely calibrated photometry and the lack of adequate spectroscopic training and test catalogs. Use of photometric redshifts has exploded in importance with the onset of massive, well-calibrated, multi-band imaging surveys like the SDSS (Oyaizu et al. 2008; Lima et al. 2008; Cunha et al. 2009). Photometric redshifts have also played an essential role in the study of very deep but smaller-area surveys like GOODS (Giavalisco et al. 2004), COSMOS (Scoville et al. 2007), and the CFHT Legacy Survey (Coupon et al. 2009). Future projects like the Dark Energy Survey (Abbott et al. 2005) and the Large Synoptic Survey Telescope (Ivezic et al. 2008) plan to rely heavily on photometric redshifts for central science goals.

One approach to photometric redshift estimation is modeled on the method used for spectroscopic redshift measurement: the comparison of the observed SED to a set of known theoretical or empirical SED templates (e.g. Fernandez-Soto et al. 1999; Benítez 2000; Feldmann et al. 2006). While this approach can work well, it is complicated by a need for precise understanding of the relative efficiency of the observing system at each wavelength, as well as the need for templates spanning the full range of wavelength and spectral type of the objects observed. An alternative approach is more empirical. These methods begin with a training set of objects for which both photometry in the system of interest and spectroscopic redshifts have been obtained. Ideally, this training set will span the space (in SED, magnitude, and redshift) of the full sample for which photometric redshifts are desired. This training set is then used to define a transformation from points in the multidimensional observed magnitude space to points in a redshift (and possibly SED) space.

Template methods have been favored, and are probably necessary, for estimating redshifts of galaxies inaccessible to spectroscopic redshift determination because they are too faint. As a result, they have played an especially important role in the HDF and UDF (Coe et al. 2006). For many current and upcoming surveys, the problem is not that spectroscopic redshifts are completely impossible to obtain, but that there are too many objects for which redshifts are required. For these applications, photo-z estimation based on training sets can be very practical. Approaches have also been developed to extend these empirical techniques beyond the limits of available training sets (Newman 2008), and to combine template-based and empirical approaches (Ilbert et al. 2006, 2009). Template-fitting methods can easily provide formal fit uncertainties. But since the largest errors occur due to mismatch between the templates and the SED being fit, these errors often significantly underestimate the full uncertainty in photo-z determination. It is likely that full exploration of photo-z uncertainties will require the use of extensive spectroscopic verification sets, which must be kept independent of training sets.

Discovering the mapping between the space of observed magnitudes and redshift-SED space is a classic machine learning problem (Mitchell 1997). Many of the approaches familiar in that field have been applied here (Connolly et al. 1995; Collister & Lahav 2004; Vanzella et al. 2004; Ball et al. 2007, 2008; Carliles et al. 2008; Freeman et al. 2009). Polynomial fitting methods assume a smooth transformation between magnitudes and redshift. These methods have the virtue of simplicity. They are most effective when the parameter space of observables is not too large, the range of SEDs being studied is limited, and the available training sets are extensive. Much more flexibility, and better performance, is possible with methods which allow a more complex mapping. These include a variety of techniques such as local polynomial fitting and artificial neural networks. Another attraction of these methods is that they can easily utilize parameters other than magnitudes, for example galaxy shapes or information about environment, in a natural way. Unfortunately, machine learning methods are particularly unsuited for extrapolating beyond the limits of their training sets; they contain no underlying model to support this.

In this work, we introduce a new machine learning technique to the photometric redshift problem: we estimate photo-zs using boosted decision trees (BDTs). A decision tree in its most basic form examines the attributes of a set of data objects to answer a single yes-or-no classification question. A series of sequential cuts is devised to separate the data into one of the two categories. The cuts used on each parameter, and the order in which they are applied, are established using a training set. Performance is tested by running the resulting decision tree on an independent verification set. "Boosted" decision trees are developed iteratively. After initial training of the tree, data objects which were originally misclassified are given increased weight, boosting the attention paid to them, and a second tree is generated. This process is iterated tens or hundreds of times, with all the resulting trees combined into a "forest" to provide significantly enhanced classification power.

For our photo-z determination, we divide the full redshift range into small bins and use a spectroscopic training set to build a set of BDT classifiers, one for each bin. In essence, each classifier examines each galaxy and evaluates the probability that its redshift falls within the given bin. By examining the distribution of probabilities with redshift we reconstruct a photo-z probability density function for each galaxy. The mean of this distribution provides an excellent "best estimate" of the photo-z, and its shape gives quantitative insight into the actual photo-z PDF.

This paper begins with a description of the data sets used for training and testing of the BDT algorithm, which we call ArborZ. Section 3 provides a detailed description of the ArborZ approach. Tests of the algorithm on both real SDSS data and simulated data are then presented in some detail. In both cases, we compare the performance of this method to some other standard methods. We conclude with a review of our results and a summary of ideas for future photo-z projects.

2. GALAXY SELECTION

To train and evaluate the performance of our photometric redshift algorithm, we use real and simulated data from the following sources:

SDSS spectroscopic catalog: The Sloan Digital Sky Survey (SDSS; York et al. 2000) is an optical imaging survey in the ugriz bands covering ~π steradians of the northern sky. Of the approximately 5 × 10⁷ galaxies detected in this survey, roughly 10⁶ are targeted for spectroscopic followup (Strauss et al. 2002; Blanton et al. 2003). We use the catalog from Data Release 6 (Adelman-McCarthy et al. 2008). The spectroscopic sample consists of a magnitude-limited "main" sample with a median redshift of 0.104 and r_Petro < 17.77, and a subsample of luminous red galaxies (LRGs) that is volume-limited to z ~ 0.38 but extends out to z ≈ 0.55 (Eisenstein et al. 2001). We further enhance this sample by including spectroscopic measurements from other surveys that can be matched to SDSS photometric observations. We include data from the 2dF-SDSS LRG and QSO survey (2SLAQ; Cannon et al. 2006) and the DEEP2 Redshift Survey (Davis et al. 2003, 2007). The final sample consists of 709,000 galaxies. From this sample we reserve 200,000 randomly selected galaxies for testing, and use the remaining sample for training.

SDSS mock spectroscopic catalog: This catalog is derived from a larger mock catalog designed to model the color, magnitude, and spatial distribution of galaxies in the SDSS. The procedure for constructing this catalog is described in Appendix A. Beginning with this sample, we then apply the spectroscopic selection from the real survey as described above.

Dark Energy Survey mock catalog: The Dark Energy Survey (DES; Abbott et al. 2005) is a planned 5000-square-degree survey of the southern sky using the Blanco 4-meter telescope at CTIO. The five-year survey will collect image data in five optical passbands, grizY, over 520 nights beginning in 2011. Photometric redshifts for this survey have previously been studied by Lin et al. (2004) and Banerji et al. (2008). The 573-square-degree mock catalog consists of ~2 × 10⁷ galaxies with z < 1.4, with photometry intended to replicate the five-year sensitivity of the DES. The sample, also constructed as described in Appendix A, is magnitude-limited to r < 24. We use a sample of ~510k randomly selected galaxies for training, and another randomly selected 200k galaxies for testing, which provide adequate statistics for this study.

The redshift distributions for these samples are shown in Figure 1.

Fig. 1. — Redshift distribution for the SDSS spectroscopic sample (left) and the DES mock catalog (right). The SDSS sample shows two peaks, one near z = 0.1 for the magnitude-limited sample, and a smaller one near z = 0.35 for the volume-limited sample of luminous red galaxies. The DES mock catalog is magnitude-limited to r < 24.



3. THE BOOSTED DECISION TREE PHOTO-Z ALGORITHM: ARBORZ

Consider the general problem of classifying a set of objects, characterized by a vector of observable variables x, into two different populations: "signal" or "background". When the two populations are relatively disjoint, simple cuts may be sufficient to achieve good efficiency with high purity. More realistic and complex situations require more sophisticated approaches, such as machine-learning techniques. Boosted decision trees (BDTs; Hastie et al. 2001) are one of the most successful such techniques to emerge in recent years, and have found applications in areas as diverse as text recognition (Howe et al. 2005), spam filtering (Drucker et al. 1999), and particle identification in high-energy physics (Roe et al. 2005).

To adapt a binary classifier to the problem of assigning a continuous photometric redshift, we divide the spectroscopic training set into a series of redshift bins, Δz_i. Each bin is assigned its own BDT classifier. The N galaxies whose redshifts fall into bin i form the "signal" training set for the ith classifier. To form the corresponding background training set, we choose 5N galaxies at random from the set of all galaxies whose spectroscopic redshifts fall more than 3σ away from the bin in question, where σ is the approximate expected resolution of the photo-z algorithm in the target sample (σ = 0.02 for SDSS data). This 3σ cut provides a clean separation between the signal and background training sets, preventing the algorithm from overtraining itself by trying to distinguish objects that are nearly identical to within errors. The choice of 5N galaxies for the background training set helps enhance the training statistics. Each galaxy in the target evaluation set is then examined by the ensemble of classifiers, and the resulting distribution of probabilities can be used either to extract a single best-estimate photo-z or to construct a photo-z probability density function. We use the boosted decision tree algorithms implemented in the Toolkit for Multivariate Analysis (TMVA; Hoecker et al. 2007).

3.1. Description

The process of training a boosted decision tree classifier begins with the construction of a single decision tree. First, a root node is formed containing all the objects. Each object has a weight w_i that is initially set to unity. The root node is then split into a left branch and a right branch by placing a cut on the one variable that gives the best separation between signal and background. To determine the optimal cut to split a node, we define the purity in a given branch by

$$ p = \frac{\sum_s w_s}{\sum_s w_s + \sum_b w_b}, $$

where w_s and w_b are the weights of the signal and background objects, respectively. We then define the Gini index (Breiman et al. 1984),

$$ G = \left( \sum_{i=1}^{n} w_i \right) p\,(1 - p), $$



where n is the number of objects on that branch. The split is made by scanning over the range of each variable and determining which cut on which variable maximizes the difference between the Gini index of the parent node and the sum of the Gini indices of the left and right branches. This splitting process is repeated until some stopping criterion is reached, for example a minimum number of objects on each leaf. A terminal leaf with a purity above some given threshold is called a signal leaf; otherwise it is a background leaf. By construction, each object falls on either a signal or a background leaf.
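The split search described above can be sketched as follows. This is an illustrative, from-scratch Python version, not the TMVA implementation used for ArborZ; the function names are hypothetical, and scanning every distinct value of each variable is an assumption (production codes typically scan a coarser grid of candidate cuts).

```python
import numpy as np


def gini(w_sig, w_bkg):
    """Gini index G = (sum of weights) * p * (1 - p), with purity p = W_s / (W_s + W_b)."""
    w_s, w_b = float(np.sum(w_sig)), float(np.sum(w_bkg))
    total = w_s + w_b
    if total == 0.0:
        return 0.0
    p = w_s / total
    return total * p * (1.0 - p)


def best_split(x, is_signal, w):
    """Exhaustively scan each variable (column of x) and each candidate cut value.

    Returns (variable index, cut value, gain), where the left branch is x[:, j] <= cut and
    gain = G(parent) - [G(left) + G(right)] is the quantity being maximized.
    """
    g_parent = gini(w[is_signal], w[~is_signal])
    best = (None, None, 0.0)
    for j in range(x.shape[1]):
        # Cutting at the largest value would leave the right branch empty, so skip it.
        for cut in np.unique(x[:, j])[:-1]:
            left = x[:, j] <= cut
            g_children = (gini(w[left & is_signal], w[left & ~is_signal])
                          + gini(w[~left & is_signal], w[~left & ~is_signal]))
            if g_parent - g_children > best[2]:
                best = (j, float(cut), g_parent - g_children)
    return best


# Toy node: variable 0 separates signal from background perfectly; variable 1 does not.
x = np.array([[0.1, 5.0], [0.2, 3.0], [0.3, 4.0], [0.9, 1.0], [1.0, 2.0], [1.1, 6.0]])
is_signal = np.array([True, True, True, False, False, False])
w = np.ones(len(x))  # unit starting weights, as for the root node
print(best_split(x, is_signal, w))  # (0, 0.3, 1.5)
```

A pure daughter node has p = 0 or 1 and hence G = 0, so the cut at 0.3 removes the parent's entire Gini index.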
Individual decision trees are relatively weak classifiers. Furthermore, small fluctuations in variables with similar discriminating power can lead to quite different tree structures, with possibly different discriminating ability. The boosting procedure allows an ensemble of such weak classifiers to be combined into a single, powerful classifier. We use the AdaBoost algorithm of Freund and Schapire (Freund & Schapire 1997). In this procedure, we iteratively generate new decision trees by assigning a higher weight to objects that were previously misclassified. The misclassification rate R of a given tree is defined by

$$ R = 1 - \max(p,\, 1 - p). $$

The subsequent tree is trained by "boosting" the weight of each misclassified object by a factor

$$ \alpha = \frac{1 - R}{R} $$

and then rescaling the weights of all the objects to keep Σ_i w_i the same for all the trees. This process is repeated many times, resulting in a "forest" of trees. We find that forests of 50-100 trees give good results, with little improvement from larger forests.

As a final step to prevent overtraining, the trees are pruned to remove statistically insignificant nodes. Define the cost complexity ρ (Breiman et al. 1984) for a given node to be

$$ \rho = \frac{R(\text{node}) - R(\text{subtree below that node})}{\#\,\text{nodes}(\text{subtree below that node}) - 1}. $$

We iteratively remove the node with the smallest ρ value as long as ρ is less than some pruning-strength threshold; we obtain the best results with a threshold of ρ = 4.5. Any duplicate trees that remain after the pruning step are removed.
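A minimal sketch of weakest-link pruning with the cost complexity ρ, written in Python on a toy tree stored as nested dictionaries (a hypothetical representation). Two assumptions: R(node) is the misclassification cost the node would have if it were a leaf, and "#nodes" is taken to be the number of terminal nodes in the subtree, as in CART-style pruning. The numerical threshold depends on how R is normalized, so the value 4.5 quoted above is not meaningful for these toy numbers.

```python
def leaves(node):
    """Terminal nodes of the subtree rooted at `node`."""
    if node["children"] is None:
        return [node]
    return [leaf for child in node["children"] for leaf in leaves(child)]


def internal_nodes(node):
    """Non-terminal nodes of the subtree rooted at `node`; these are the pruning candidates."""
    if node["children"] is None:
        return []
    return [node] + [n for child in node["children"] for n in internal_nodes(child)]


def cost_complexity(node):
    """rho = [R(node) - R(subtree below node)] / (number of terminal nodes in subtree - 1)."""
    below = leaves(node)
    r_subtree = sum(leaf["R"] for leaf in below)
    return (node["R"] - r_subtree) / (len(below) - 1)


def prune(tree, strength):
    """Collapse the node with the smallest rho, repeatedly, while that rho is below `strength`."""
    while True:
        candidates = internal_nodes(tree)
        if not candidates:
            return tree
        weakest = min(candidates, key=cost_complexity)
        if cost_complexity(weakest) >= strength:
            return tree
        weakest["children"] = None  # the subtree is replaced by a single leaf


def toy_tree():
    """Hypothetical tree; "R" is the misclassification cost each node would have as a leaf."""
    a = {"R": 4.0, "children": [{"R": 1.0, "children": None}, {"R": 1.0, "children": None}]}
    b = {"R": 3.0, "children": None}
    return {"R": 10.0, "children": [a, b]}


# rho(a) = (4 - 2) / 1 = 2.0 and rho(root) = (10 - 5) / 2 = 2.5 before any pruning.
print(len(leaves(prune(toy_tree(), strength=2.2))))  # 2: only subtree a is collapsed
print(len(leaves(prune(toy_tree(), strength=4.5))))  # 1: after a, the root (rho = 3.0) goes too
```

Note that ρ for the root changes after subtree a is collapsed, which is why the candidates are re-evaluated on every pass.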

The final score of an object is a weighted sum of its score on each tree in the forest:

$$ y(\mathbf{x}) = \sum_{i \in \text{forest}} \ln(\alpha_i)\, h_i(\mathbf{x}), $$

where the subscore h_i on an individual tree is +1 if the object is classified as signal and −1 if it is classified as background. This procedure gives higher weight to trees with lower misclassification rates. The more signal-like an object appears, the larger its score.
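The boosting loop and the final score can be illustrated with the short Python sketch below. To keep it compact, it assumes one-cut "stumps" as the weak learners instead of full pruned trees, and it takes R to be the weighted fraction of misclassified training objects; all names are hypothetical and this is not the TMVA code used in the paper.

```python
import numpy as np


def fit_stump(x, y, w):
    """Weak learner (a one-cut 'tree'): choose variable j, threshold t and polarity s
    minimizing the weighted misclassification rate. Labels y are +1 (signal) / -1 (background)."""
    best = (np.inf, 0, 0.0, 1)
    for j in range(x.shape[1]):
        for t in np.unique(x[:, j]):
            pred = np.where(x[:, j] > t, 1, -1)
            for s in (1, -1):
                err = np.sum(w[s * pred != y]) / np.sum(w)
                if err < best[0]:
                    best = (err, j, t, s)
    return best


def stump_predict(x, j, t, s):
    """Subscore h(x): +1 for signal, -1 for background."""
    return s * np.where(x[:, j] > t, 1, -1)


def adaboost(x, y, n_trees=50):
    w = np.ones(len(y))                            # every object starts with unit weight
    w_total = w.sum()
    forest = []
    for _ in range(n_trees):
        err, j, t, s = fit_stump(x, y, w)
        err = min(max(err, 1e-10), 0.5 - 1e-10)    # keep alpha finite and greater than 1
        alpha = (1.0 - err) / err
        miss = stump_predict(x, j, t, s) != y
        w[miss] *= alpha                           # boost the misclassified objects
        w *= w_total / w.sum()                     # same summed weight for every tree
        forest.append((np.log(alpha), j, t, s))
    return forest


def score(forest, x):
    """y(x) = sum over the forest of ln(alpha_i) * h_i(x)."""
    return sum(log_alpha * stump_predict(x, j, t, s) for log_alpha, j, t, s in forest)


rng = np.random.default_rng(1)
x = rng.uniform(-1.0, 1.0, size=(200, 2))
y = np.where(x[:, 0] + x[:, 1] > 0.0, 1, -1)   # a diagonal boundary no single cut reproduces
forest = adaboost(x, y, n_trees=50)
accuracy = np.mean(np.sign(score(forest, x)) == y)
print(f"training accuracy: {accuracy:.2f}")
```

Each stump alone is a weak classifier on this diagonal boundary; the ln(α)-weighted vote of the forest builds a staircase approximation to it.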

The distribution of scores for signal and background objects can be converted into signal and background classification probabilities, y_S(B). These distributions are shown in Figure 2 for galaxies in the SDSS spectroscopic evaluation sample. Background galaxies (that is, those whose spectroscopic redshifts fall more than 3σ outside the signal region in question) have probabilities strongly peaked at low values. Thus, the BDT classifier probability is a strong redshift discriminator. Finally, knowing these classification probabilities, we can compute the probability that a galaxy with a given BDT score falls in the signal redshift bin:

$$ P_{S,i} = \frac{f_S\, y_{S,i}}{f_S\, y_{S,i} + (1 - f_S)\, y_{B,i}}, $$

where the signal fraction f_S = N_S/(N_S + N_B) is the expected fraction of galaxies in each redshift bin, and N_S, N_B are the expected numbers of signal and background galaxies. The expected signal fraction in each redshift bin can be obtained directly from the redshift distribution of the training set, in cases where the redshift distribution in the target evaluation set is similar. In fact, as seen in Figure 2, the signal and background classification probabilities are sufficiently well separated that the signal probability P_S is relatively insensitive to the choice of signal fraction. On the other hand, any training-set-based method will have difficulty when the properties of the target evaluation set (whether magnitudes, colors, or redshifts) differ substantially from those of the training set.

Fig. 2. — Top: Classification probability distributions for signal and background galaxies, normalized to equal areas. Bottom: Integrated classification probability distributions, showing the fraction of signal and background galaxies with probability greater than the given value.
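A minimal Python sketch of this conversion, assuming the score densities y_S and y_B are estimated as normalized histograms of the BDT scores of signal and background training objects. The binning, the helper names, and the Gaussian toy scores are all hypothetical choices for illustration.

```python
import numpy as np


def score_densities(scores_sig, scores_bkg, bins=40):
    """Normalized histograms y_S and y_B of BDT scores for signal and background objects."""
    edges = np.histogram_bin_edges(np.concatenate([scores_sig, scores_bkg]), bins=bins)
    y_s, _ = np.histogram(scores_sig, bins=edges, density=True)
    y_b, _ = np.histogram(scores_bkg, bins=edges, density=True)
    return edges, y_s, y_b


def lookup(edges, density, score):
    """Value of a histogram density at the given score(s); out-of-range scores use the end bins."""
    idx = np.clip(np.searchsorted(edges, score, side="right") - 1, 0, len(density) - 1)
    return density[idx]


def signal_probability(y_s, y_b, f_s):
    """P_S = f_S y_S / (f_S y_S + (1 - f_S) y_B)."""
    num = f_s * np.asarray(y_s, float)
    return num / (num + (1.0 - f_s) * np.asarray(y_b, float))


rng = np.random.default_rng(0)
edges, y_s, y_b = score_densities(rng.normal(1.0, 0.3, 5000), rng.normal(-1.0, 0.3, 5000))
scores = [1.0, -1.0]   # one signal-like and one background-like BDT score
p = signal_probability(lookup(edges, y_s, scores), lookup(edges, y_b, scores), f_s=1 / 64)
print(p)               # high for the signal-like score, low for the background-like one
```

When y_S and y_B barely overlap, as here, P_S stays close to 0 or 1 even for a small signal fraction such as 1/64, which illustrates the insensitivity to f_S noted above.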

3.2. Performance in SDSS Data 

The SDSS spectroscopic training sample of 510k galaxies described in Section 2 is divided into 64 redshift bins containing equal numbers of galaxies. This ensures that each training subsample has equal statistics; as a result, the redshift bins themselves vary in width. The N galaxies in bin i form the signal training set for the ith classifier, while the background training set consists of 5N galaxies chosen at random from the set of galaxies with redshifts at least Δz = 0.06 (approximately 3σ_z,photo) away from the redshift bin in question. We use the five observed magnitudes ugriz as our training variables.
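The equal-count binning and the construction of the per-bin training sets can be sketched in Python as below. The uniform toy redshifts and helper names are hypothetical; the sketch assumes bin i is half-open, [edges[i], edges[i+1]), apart from the last bin, and draws the 5N background galaxies without replacement.

```python
import numpy as np


def equal_count_edges(z_spec, n_bins=64):
    """Redshift bin edges chosen so that every bin holds (nearly) the same number of galaxies."""
    return np.quantile(z_spec, np.linspace(0.0, 1.0, n_bins + 1))


def training_sets(z_spec, edges, i, gap=0.06, bkg_factor=5, rng=None):
    """Indices of the signal and background training sets for the classifier of bin i.

    Signal: galaxies in [edges[i], edges[i+1]) (the last bin also includes its upper edge).
    Background: bkg_factor * N galaxies drawn at random, without replacement, from galaxies
    whose redshifts lie more than `gap` below or above the bin.
    """
    rng = np.random.default_rng() if rng is None else rng
    lo, hi = edges[i], edges[i + 1]
    below_hi = z_spec <= hi if i == len(edges) - 2 else z_spec < hi
    sig = np.flatnonzero((z_spec >= lo) & below_hi)
    far = np.flatnonzero((z_spec < lo - gap) | (z_spec > hi + gap))
    bkg = rng.choice(far, size=min(bkg_factor * len(sig), len(far)), replace=False)
    return sig, bkg


rng = np.random.default_rng(0)
z_spec = rng.uniform(0.0, 0.6, 64_000)  # toy stand-in for a spectroscopic training set
edges = equal_count_edges(z_spec, n_bins=64)
sig, bkg = training_sets(z_spec, edges, i=10, rng=rng)
print(len(sig), len(bkg))  # N signal and 5N background galaxies for classifier 10
```

Quantile-based edges are what make the bins narrow where galaxies are plentiful and wide in the sparse tails of the redshift distribution.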
The algorithm's performance is evaluated on a subsample of 200,000 galaxies that were excluded from the training process. The ensembles of classifier probabilities for each redshift bin, P_S,i, for some typical galaxies are shown in Figure 3. The mean of this histogram determines the "best-estimate" photo-z for each galaxy, and the range containing the middle 68% of the area determines the error. However, the full power of the algorithm comes from the reconstruction of the complete probability distribution itself.
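One way to turn a galaxy's per-bin probabilities into the mean and the middle-68% range is sketched below in Python. It assumes the probability is spread uniformly within each (variable-width) bin, so the cumulative distribution is piecewise linear between bin edges; it also returns the density obtained by dividing by the bin widths. The function name and toy inputs are hypothetical.

```python
import numpy as np


def photoz_summary(edges, p_bin):
    """Mean photo-z and central 68% interval from per-bin classifier probabilities.

    edges: n_bins + 1 redshift bin edges (bins may have unequal widths).
    p_bin: one probability P_S,i per bin; these need not sum to 1.
    """
    edges = np.asarray(edges, float)
    mass = np.asarray(p_bin, float)
    mass = mass / mass.sum()                            # probability assigned to each bin
    centers = 0.5 * (edges[:-1] + edges[1:])
    z_mean = float(np.sum(mass * centers))              # mean of the distribution
    cdf = np.concatenate([[0.0], np.cumsum(mass)])      # CDF at the bin edges
    z_lo, z_hi = np.interp([0.16, 0.84], cdf, edges)    # middle 68% of the area
    pdf = mass / np.diff(edges)                         # density, corrected for bin widths
    return z_mean, float(z_lo), float(z_hi), pdf


z_mean, z_lo, z_hi, pdf = photoz_summary([0.0, 0.1, 0.2, 0.3, 0.4], [0.0, 1.0, 1.0, 0.0])
print(z_mean, z_lo, z_hi)  # mean 0.2, interval from about 0.132 to 0.268
```

The quoted error would then be derived from z_hi − z_lo; whether one uses the full width or half of it is a convention choice.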



[Fig. 3 panel labels: z_spec = 0.06, z_photo = 0.06 ± 0.02; z_spec = 0.11, z_photo = 0.10 ± 0.02]

Fig. 3. — Distribution of ArborZ boosted decision tree classifier probabilities as a function of redshift for some individual galaxies in the SDSS spectroscopic evaluation set.

Fig. 4. — ArborZ photo-z estimate vs. spectroscopic redshift for the SDSS spectroscopic evaluation set.

Fig. 5. — ArborZ performance in the SDSS spectroscopic evaluation set, compared with the SDSS production photo-z algorithms D1 and CC2. The plot shows the 68% confidence-interval width of the photo-z residual distribution z_photo − z_spec as a function of spectroscopic redshift. The dashed ArborZ curve shows the effect of placing a cut on the peak classifier probability as described in Section 3.3. This cut rejects 11% of the galaxies.



The photometric redshift obtained from the distribution of BDT probabilities for the evaluation set is shown as a function of spectroscopic redshift in Figure 4. Figure 5 shows the 68% confidence-interval width of the residual distribution z_photo − z_spec as a function of spectroscopic redshift. For comparison, we also show the performance of the two production SDSS photo-z algorithms, D1 and CC2 (Oyaizu et al. 2008). These methods both employ neural networks; their training variables are, respectively, the ugriz magnitudes and the u − g, g − r, r − i, and i − z colors. The ArborZ algorithm equals or exceeds the performance of these two algorithms over most of the redshift range in the sample.
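For reference, here is a Python sketch of how a 68% width of z_photo − z_spec can be computed in bins of spectroscopic redshift. Here σ68 is assumed to be half the distance between the 16th and 84th percentiles of the residuals, so that it matches the standard deviation for Gaussian scatter. The function names are hypothetical and the toy data are synthetic.

```python
import numpy as np


def sigma68(residuals):
    """Half-width of the interval containing the central 68% of the residuals."""
    lo, hi = np.percentile(residuals, [16.0, 84.0])
    return 0.5 * (hi - lo)


def sigma68_vs_z(z_spec, z_photo, edges):
    """sigma68 of z_photo - z_spec in bins of spectroscopic redshift (NaN for empty bins)."""
    z_spec, z_photo = np.asarray(z_spec, float), np.asarray(z_photo, float)
    idx = np.digitize(z_spec, edges) - 1
    out = np.full(len(edges) - 1, np.nan)
    for k in range(len(out)):
        sel = idx == k
        if sel.any():
            out[k] = sigma68(z_photo[sel] - z_spec[sel])
    return out


rng = np.random.default_rng(0)
z_spec = rng.uniform(0.0, 0.6, 100_000)
z_photo = z_spec + rng.normal(0.0, 0.02, z_spec.size)  # synthetic photo-zs with 0.02 scatter
widths = sigma68_vs_z(z_spec, z_photo, np.linspace(0.0, 0.6, 7))
print(np.round(widths, 4))  # each bin should come out close to 0.02
```

Percentile-based widths are less sensitive than a standard deviation to the catastrophic outliers that photo-z residuals often contain.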

3.3. Error Estimation 

The ArborZ galaxy-by-galaxy probability distributions, such as those shown in Figure 3, provide several different methods to estimate the photo-z error. First, as noted above, we can simply determine the width of the region of the probability distribution that contains the middle 68% of the area, σ68. This is our default method. Figure 6 shows the normalized residual distribution, (z_photo − z_spec)/σ68, in the SDSS evaluation sample. This distribution is well described by a Gaussian with a mean of −0.006 and a width of 0.985, indicating that the errors are properly estimated and that the photo-zs are unbiased. The fraction of catastrophic mismeasurements, which we define to be cases where |z_photo − z_spec| > 3σ_z,photo, is 1.9%, compared to 2.6% (3.7%) for the D1 (CC2) photo-z algorithm. However, the D1 and CC2 algorithms have normalized residual widths of 1.12 and 1.13, respectively, so these algorithms may be underestimating their errors by roughly 10%.
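The diagnostics quoted above can be computed along the lines of the following Python sketch. It summarizes the normalized residuals with the median and half the 16th-84th percentile range as robust stand-ins for the Gaussian fit used in the text (an assumption), and it counts |z_photo − z_spec| > 3σ as catastrophic. The synthetic data also show how errors quoted too small inflate the normalized width.

```python
import numpy as np


def error_diagnostics(z_photo, z_spec, sigma):
    """Summaries of the normalized residual (z_photo - z_spec) / sigma.

    Returns (median, width, catastrophic fraction): the width is half the 16th-84th percentile
    range, a robust stand-in for a Gaussian fit, and catastrophic means |pull| > 3.
    """
    pull = (np.asarray(z_photo, float) - np.asarray(z_spec, float)) / np.asarray(sigma, float)
    lo, hi = np.percentile(pull, [16.0, 84.0])
    return float(np.median(pull)), float(0.5 * (hi - lo)), float(np.mean(np.abs(pull) > 3.0))


rng = np.random.default_rng(0)
n = 200_000
z_spec = rng.uniform(0.0, 0.6, n)
sigma = np.full(n, 0.02)
z_photo = z_spec + rng.normal(0.0, 0.02, n)  # scatter drawn with exactly the quoted sigma

center, width, catastrophic = error_diagnostics(z_photo, z_spec, sigma)
_, width_under, _ = error_diagnostics(z_photo, z_spec, sigma / 1.12)  # quoted errors ~11% too small
print(center, width, catastrophic, width_under)  # width near 1; width_under near 1.11
```

A width above 1, as for D1 and CC2, therefore points to quoted errors that are too small by roughly the same factor.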

An alternate approach to estimating the photo-z error exploits the fact that the BDT probabilities provide a quantitative figure-of-merit for the classification strength: the more signal-like an object appears to a given classifier, the higher its probability for that redshift bin. We therefore expect galaxies with larger peak probabilities (P_peak) to be more reliably measured. The distribution of peak probabilities in the SDSS spectroscopic evaluation sample is shown in the top panel of Figure 7. In the bottom panel, we show the width of the residual distribution z_photo − z_spec as a function of the peak probability, which displays the expected correlation.

Fig. 6. — Normalized residual distribution (z_photo − z_spec)/σ68 for galaxies in the SDSS spectroscopic evaluation sample. The fit is to a Gaussian with a mean of −0.006 and a width of 0.985.

In applications where well-measured photo-zs are a prime concern and some reduction in statistics can be tolerated, one could place a cut on the peak probability to obtain a better-measured subsample of galaxies. For example, requiring that the peak probability be greater than 0.99 (0.90) retains 88% (99%) of the galaxies in the SDSS spectroscopic sample. Galaxies that pass the P_peak > 0.99 cut have a mean photo-z error of 0.021. The mean photo-z error of galaxies rejected by this cut is 0.041, nearly twice as large.
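A Python sketch of the peak-probability cut follows. It assumes the per-bin probabilities are stored as an (n_galaxies × n_bins) array and takes "photo-z error" to mean |z_photo − z_spec|; the text does not specify which error measure is averaged. The helper names are hypothetical and the four toy galaxies are made up.

```python
import numpy as np


def peak_probability_cut(p_bins, threshold=0.99):
    """True for galaxies whose largest per-bin classifier probability P_peak exceeds threshold.
    p_bins has shape (n_galaxies, n_bins) and holds P_S,i for every galaxy and bin."""
    return np.max(np.asarray(p_bins, float), axis=1) > threshold


def cut_summary(p_bins, z_photo, z_spec, threshold=0.99):
    """Fraction retained and mean |z_photo - z_spec| for galaxies passing and failing the cut."""
    keep = peak_probability_cut(p_bins, threshold)
    err = np.abs(np.asarray(z_photo, float) - np.asarray(z_spec, float))
    return float(keep.mean()), float(err[keep].mean()), float(err[~keep].mean())


p_bins = [[0.995, 0.50], [0.30, 0.95], [0.999, 0.20], [0.10, 0.20]]
z_photo = [0.11, 0.25, 0.33, 0.40]
z_spec = [0.10, 0.20, 0.30, 0.37]
print(cut_summary(p_bins, z_photo, z_spec))  # retains half; rejected galaxies have twice the error
```

Raising the threshold trades sample size for a cleaner subsample, so the value is best chosen per application.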





Fig. 7. — Top: Peak BDT probability distribution in the SDSS 
spectroscopic evaluation set. Bottom: Photo-z error as a function 
of peak probability. 



3.4. Reconstructed Redshift Distribution N(z) and the 
Photo-z PDF 

The methods described above provide a single best estimate of each galaxy's photometric redshift, together with an estimated Gaussian error, as is the common practice for most photo-z algorithms. Such estimates, however, are generally biased (Lima et al. 2008). The BDT apparatus, with its evaluation of the classification probability for each redshift, leads naturally to the reconstruction of each galaxy's full photo-z probability density function. For many applications, such as measurements of weak gravitational lensing or galaxy-galaxy correlations for baryon acoustic oscillation surveys, individual galaxy-by-galaxy photo-zs are less important than an accurate count of the number of galaxies in each redshift bin, N(z). For this purpose, the PDFs are more useful and less biased than the best-estimate photo-zs.

When normalized to unit area and corrected for the variable bin widths, the probability distributions illustrated in Figure 3 become PDFs. Figure 8 shows the result of summing these PDFs for galaxies in the SDSS spectroscopic evaluation sample, together with the results from using the ArborZ peak photo-z estimate and the two SDSS production photo-z algorithms. As a quantitative comparison, we compute the goodness-of-reconstruction parameter

$$ \chi^2 = \frac{1}{N_{\rm bins}} \sum_i \frac{\left( N_{{\rm photo},i} - N_{{\rm spec},i} \right)^2}{N_{{\rm spec},i}}, $$

where i labels the redshift bins. We find that χ² = 5.88, 4.04, 2.35, and 1.99 for the ArborZ (peak) method, CC2, D1, and the ArborZ PDF method, respectively. Thus, the summed PDFs provide the most faithfully reconstructed N(z) of the algorithms considered.
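A minimal Python sketch of the stacked-PDF estimate of N(z) and of a goodness-of-reconstruction χ², here taken as the per-bin average of (N_photo,i − N_spec,i)²/N_spec,i. It assumes each galaxy's per-bin probabilities are first normalized to unit total, so every galaxy contributes exactly one count spread across the bins; no width correction is needed when comparing counts in the same bins. Skipping bins with N_spec,i = 0 to avoid division by zero is also an assumption, and the helper names are hypothetical.

```python
import numpy as np


def stacked_nz(p_bins):
    """Expected number of galaxies per bin from stacked per-galaxy probabilities.

    p_bins: (n_galaxies, n_bins) array of classifier probabilities P_S,i. Each row is
    normalized to unit total so every galaxy contributes exactly one count.
    """
    p = np.asarray(p_bins, float)
    return (p / p.sum(axis=1, keepdims=True)).sum(axis=0)


def chi2_reconstruction(n_photo, n_true):
    """Per-bin average of (N_photo,i - N_true,i)^2 / N_true,i, skipping empty true bins."""
    n_photo, n_true = np.asarray(n_photo, float), np.asarray(n_true, float)
    ok = n_true > 0
    return float(np.mean((n_photo[ok] - n_true[ok]) ** 2 / n_true[ok]))


# Two toy galaxies and three redshift bins.
n_pdf = stacked_nz([[1.0, 1.0, 0.0], [0.0, 2.0, 2.0]])
print(n_pdf)                                       # [0.5 1.  0.5]
print(chi2_reconstruction(n_pdf, [1.0, 1.0, 1.0]))
```

Unlike a histogram of single best estimates, the stacked estimate lets a galaxy with a broad or double-peaked PDF contribute fractionally to several bins.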

3.5. Performance in SDSS Mock Catalog 

Before characterizing the performance of the ArborZ algorithm in future surveys, we wish to establish the validity of our mock catalogs, described in Appendix A, by constructing a mock catalog similar to the SDSS spectroscopic sample. Using a larger mock with colors drawn from the full SDSS data sample, we simulate the SDSS spectroscopic selection to create a mock catalog containing both a low-redshift flux-limited component and a higher-redshift volume-limited population of LRGs. Color-color comparisons of the real SDSS spectroscopic sample and the mock catalog, such as the one shown in Figure 9, show good qualitative agreement. Differences in the outlier populations are likely due to simplified treatment of SDSS photometric errors in the mock catalog.

We train the ArborZ algorithm on the mock catalog using a training set identical in size to that employed in the real data. Figure 10 shows the photo-z residual distribution for the SDSS spectroscopic mock, compared with the same distribution in real data, where the training in the data was also performed on the observed magnitudes only. The good agreement between these two distributions gives us confidence in extending these photo-z error comparisons to mock catalogs for deeper surveys.

Fig. 8. — Top: Reconstructed redshift distribution in the SDSS spectroscopic evaluation sample for four algorithms: ArborZ using the single best-estimate photo-z, the summed ArborZ PDFs, and the two SDSS production algorithms D1 and CC2. Bottom: Fractional error distribution (N_spec − N_photo)/N_spec.

Fig. 9. — Color-color diagrams showing g − r vs. r − i for the SDSS spectroscopic sample (left) and the SDSS spectroscopic mock catalog (right).

3.6. Performance in DES Mock Catalog



Similarly, we have applied the ArborZ algorithm to the mock catalog of the Dark Energy Survey. We train and evaluate the algorithm using samples of 500k and 200k galaxies, respectively, randomly selected from the full 20-million-galaxy sample. We train on the observed grizY magnitudes. The resulting ArborZ photo-zs are shown in Figure 11. For comparison, we have trained the neural net algorithm ANNz (Collister & Lahav 2004; Firth et al. 2003) on the same training set. The neural net consists of a committee of five networks with five inputs (the observed magnitudes), two hidden layers with ten nodes each, and one output layer. The comparative performance of the two algorithms is illustrated in Figures 12 and 13.
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Fig. 11.— 

z photo vs - z f° r galaxies in the DES mock catalog. 
Fig. 12. Photo-z error vs. z for the DES mock catalog for the ArborZ algorithm (solid) and the neural net algorithm ANNz (dots). Also shown is the distribution for the 74% of the galaxies in the catalog that pass a P_peak > 0.99 cut.



The redshift distributions N(z) in Figure 13 display unphysical peaked structures when reconstructed by both the neural net and ArborZ using the peak photo-z. This could indicate bias in the training, or possibly a problem with the color distribution in the mock catalog. However, the peaks largely disappear when the ArborZ PDFs are stacked to reconstruct the redshift distribution. The goodness-of-reconstruction parameter χ² defined above is 7.10 for ANNz, 5.59 for the ArborZ (peak) method, and 0.45 using the ArborZ PDFs. The much better agreement obtained from the PDFs highlights the limitations of using a single best-estimate photo-z to characterize a galaxy, and shows the benefit of knowing the full PDF.
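The difference between the peak and stacked-PDF reconstructions can be made concrete with a toy example. Assuming each galaxy's PDF is tabulated on a common grid of redshift bins, the sketch below builds N(z) either by histogramming each PDF's peak bin or by summing the normalized PDFs. The χ² function is a hypothetical Poisson-weighted, per-bin form chosen for illustration; it is not necessarily the goodness-of-reconstruction parameter defined earlier in the paper.

```python
import numpy as np

def nz_from_peaks(pdfs):
    """N(z) from single best estimates: each galaxy adds 1 to its PDF's peak bin."""
    counts = np.zeros(pdfs.shape[1])
    np.add.at(counts, np.argmax(pdfs, axis=1), 1.0)
    return counts

def nz_from_stacked_pdfs(pdfs):
    """N(z) from stacking: each galaxy contributes its whole normalized PDF."""
    norm = pdfs / pdfs.sum(axis=1, keepdims=True)
    return norm.sum(axis=0)

def chi2_per_bin(n_true, n_est):
    """HYPOTHETICAL goodness-of-fit: Poisson-weighted chi^2 averaged over
    bins with n_true > 0 (not necessarily the paper's definition)."""
    mask = n_true > 0
    return float(np.mean((n_true[mask] - n_est[mask]) ** 2 / n_true[mask]))

# Toy case: two galaxies share a bimodal PDF over four redshift bins,
# while in truth one galaxy lies in each mode (bins 1 and 3).
pdfs = np.array([[0.0, 0.6, 0.0, 0.4],
                 [0.0, 0.6, 0.0, 0.4]])
n_true = np.array([0.0, 1.0, 0.0, 1.0])
chi2_peak = chi2_per_bin(n_true, nz_from_peaks(pdfs))
chi2_stack = chi2_per_bin(n_true, nz_from_stacked_pdfs(pdfs))
```

Here the peak histogram piles both galaxies into bin 1 and gives χ² = 1.0, whereas the stacked PDFs spread them over both modes and give χ² of about 0.04, mirroring in miniature the behavior reported for Figure 13.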
Fig. 13. Top: Reconstructed redshift distributions in the DES mock catalog, compared with the true distribution, for the ArborZ (peak photo-z), ANNz, and ArborZ (summed PDF) algorithms. Bottom: Fractional error distribution (N_true - N_photo)/N_true.



4. CONCLUSIONS 

We have presented a new technique to estimate photometric redshifts for galaxies. The technique, which we call ArborZ, uses Boosted Decision Tree classifiers trained on galaxies with spectroscopically determined redshifts. In addition to providing a single best-estimate photo-z with a reliably calculated error, the method naturally produces a complete probability density function (PDF) for each galaxy's photo-z. The PDFs are shown to yield a more accurate reconstruction of the redshift distribution N(z) than algorithms that rely on a single photo-z for each galaxy. We also find that the peak probability for each galaxy provides a quantitative measure of the photo-z quality, and can be used to define subsamples with better-measured photo-zs and fewer outliers. The performance of the algorithm on SDSS data with known spectroscopic redshifts below z = 0.5 is comparable to or better than that of the production DR6 SDSS photo-zs, with slightly smaller errors and fewer catastrophic failures. We then studied the performance of the ArborZ algorithm on a simulated sample spanning a much larger redshift range (0 < z < 1.4), intended to model the five-year sensitivity of the upcoming Dark Energy Survey. When trained on identical training sets, the ArborZ algorithm outperforms the artificial neural net algorithm ANNz, making it a promising candidate for determining photo-zs in deep photometric surveys.

As an empirical, learning-based algorithm, our ap- 
proach does not provide a ready path to reliable photo-zs 
for objects significantly different from those in the train- 
ing set. Future work will center upon exploring the biases 
inherent in different training sets, and on understanding 
the benefits and limitations of using simulated data to 
fill in gaps in these training sets. 
APPENDIX

A. MOCK CATALOGS AND THE ADDGALS ALGORITHM 



The Adding Density-Determined GAlaxies to Lightcone Simulations algorithm (ADDGALS; Wechsler et al. 2009) is a method for producing mock galaxy lightcone surveys that accurately reproduce the spatial and color properties of a galaxy population. It operates in conjunction with a large-volume, low-mass-resolution N-body simulation, adding galaxies with properties based on dark matter and galaxy overdensities. The first step of the algorithm includes one observational input, the galaxy luminosity function in a given band. Here we use the most recently published r-band luminosity function measured from the SDSS (Montero-Dorta & Prada 2008), and assume passive evolution of 1.3 mag per unit redshift (Faber et al. 2007). A list of galaxies satisfying this luminosity function is generated and assigned to dark matter particles in the simulation. This is done using a luminosity-dependent function, P(δ | L_r/L*), which specifies the distribution of densities chosen as a function of galaxy luminosity. Here we use the dark matter density smoothed at the Lagrangian scale corresponding to a mass near M*, M_smooth = 1.8 × 10^13 h^-1 M_⊙. The form of this PDF has been determined by studying subhalos and semi-analytic galaxies in higher-resolution (but smaller-volume) simulations. We find that the distribution of the radius enclosing M_smooth is well represented by a lognormal plus a Gaussian over a range of masses and luminosities. This PDF is specified by five parameters, each a function of L_r/L*; to determine them, we use a second observational input as a constraint, the measured luminosity-dependent two-point correlation function (Zehavi et al. 2005). The L_r/L* term in the PDF allows us to account for passive galaxy evolution but ignores other evolutionary effects such as ongoing star formation. This algorithm is applied for galaxies brighter than 0.4L*. The best-fit model parameters are chosen using an MCMC analysis. This results in a distribution of galaxies with r-band magnitudes whose luminosity function and clustering closely match observations. The use of the smoothed background density instead of resolved halos allows the method to populate large-volume simulations with lower resolution than would otherwise be possible, but sacrifices some fidelity in high-density regions.
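The first ADDGALS step, drawing luminosities and then a density-dependent location for each galaxy, can be sketched as follows. This is a minimal illustration, not the ADDGALS code. It assumes a Schechter form for the r-band luminosity function with placeholder parameters (not the Montero-Dorta & Prada 2008 fit), applies the 1.3 mag per unit redshift passive brightening of M* and the 0.4L* cut from the text, and draws the radius enclosing M_smooth from a lognormal-plus-Gaussian mixture whose five parameters depend on L_r/L* through hypothetical functional forms. Particles are matched by nearest radius, with reuse allowed for simplicity, and no MCMC fit to the correlation function is attempted.

```python
import numpy as np

Q_EVOL = 1.3                 # passive evolution, mag per unit redshift (from the text)
MSTAR0, ALPHA = -20.7, -1.2  # HYPOTHETICAL Schechter placeholders, not a published fit

def mstar(z):
    """Characteristic magnitude, brightening by Q_EVOL mag per unit redshift."""
    return MSTAR0 - Q_EVOL * z

def sample_schechter_mags(n, z, rng, bright_offset=-4.0):
    """Inverse-CDF draw of absolute r-band magnitudes from a Schechter
    function, restricted to galaxies brighter than 0.4 L*(z)."""
    ms = mstar(z)
    m_faint = ms - 2.5 * np.log10(0.4)        # L = 0.4 L*  ->  M = M* + 0.995
    grid = np.linspace(ms + bright_offset, m_faint, 2000)
    x = 10 ** (0.4 * (ms - grid))             # L / L*
    phi = x ** (ALPHA + 1) * np.exp(-x)       # dN/dM up to a constant
    cdf = np.concatenate(([0.0], np.cumsum(0.5 * (phi[1:] + phi[:-1]))))
    cdf /= cdf[-1]
    return np.interp(rng.uniform(size=n), cdf, grid)

def mixture_params(l_ratio):
    """HYPOTHETICAL forms for the five parameters as functions of L/L*:
    lognormal weight, lognormal (mu, sigma), Gaussian (mean, sigma)."""
    w = float(np.clip(0.6 + 0.1 * np.log10(l_ratio), 0.0, 1.0))
    return w, np.log(3.0) - 0.2 * np.log(l_ratio), 0.5, 6.0, 2.0

def draw_radius(l_ratio, rng):
    """Radius enclosing M_smooth (assumed h^-1 Mpc) from the mixture; in these
    placeholder forms brighter galaxies favor smaller radii (denser regions)."""
    w, mu_ln, s_ln, mu_g, s_g = mixture_params(l_ratio)
    if rng.uniform() < w:
        return rng.lognormal(mu_ln, s_ln)
    return max(rng.normal(mu_g, s_g), 1e-3)   # keep the radius positive

def assign_to_particles(radii, particle_radii):
    """Index of the particle whose radius is closest to each drawn radius.
    This sketch allows one particle to host several galaxies."""
    order = np.argsort(particle_radii)
    sorted_r = particle_radii[order]
    pos = np.clip(np.searchsorted(sorted_r, radii), 1, len(sorted_r) - 1)
    left_closer = (radii - sorted_r[pos - 1]) < (sorted_r[pos] - radii)
    return order[np.where(left_closer, pos - 1, pos)]

rng = np.random.default_rng(1)
z = 0.3
mags = sample_schechter_mags(1000, z, rng)
l_ratio = 10 ** (0.4 * (mstar(z) - mags))
radii = np.array([draw_radius(l, rng) for l in l_ratio])
particle_radii = rng.lognormal(np.log(4.0), 0.6, size=5000)  # stand-in particle data
host = assign_to_particles(radii, particle_radii)
```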

One type of high-density region that is not well reproduced with this approach is the centers of clusters, which host brightest cluster galaxies (BCGs). It has been shown that BCG luminosities are tightly correlated with the masses of their host halos, and also that BCGs are brighter than would be expected if they were drawn from the Schechter function of satellite galaxies in the same cluster (Hansen et al. 2009). To account for these trends, we modify the algorithm so that, before any other galaxies are inserted, a BCG luminosity is calculated for each resolved halo of the simulation, based on the observationally constrained mean and scatter of the luminosity-mass relationship for central galaxies (Hansen et al. 2009; Vale & Ostriker 2004; Zheng et al. 2007). These objects are removed from our initial list of galaxies and placed at the centers of halos (in this catalog version, this is done for halos more massive than ~5 × 10^13 h^-1 M_⊙).
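The BCG step can be sketched in the same spirit. In the sketch below, the mean magnitude-halo mass relation and its Gaussian scatter are hypothetical placeholders standing in for the observationally constrained relations cited above; only the ~5 × 10^13 h^-1 M_⊙ threshold comes from the text. How the BCGs are removed from the initial list is also an assumption here: for each BCG we drop the listed galaxy of closest magnitude, so the overall luminosity function is approximately preserved.

```python
import numpy as np

M_MIN_BCG = 5e13  # h^-1 M_sun, halo mass threshold quoted in the text

def bcg_magnitudes(halo_mass, rng, m_pivot=1e14, mag_pivot=-22.5,
                   slope=-0.6, scatter=0.3):
    """Draw BCG absolute r-band magnitudes for halos above M_MIN_BCG.

    Mean relation M_r = mag_pivot + slope * log10(M / m_pivot) with Gaussian
    scatter in magnitude; all four numbers are HYPOTHETICAL placeholders.
    Returns (indices of halos that receive a BCG, their magnitudes).
    """
    halo_mass = np.asarray(halo_mass, dtype=float)
    hosts = np.flatnonzero(halo_mass > M_MIN_BCG)
    mean_mag = mag_pivot + slope * np.log10(halo_mass[hosts] / m_pivot)
    return hosts, mean_mag + rng.normal(0.0, scatter, size=hosts.size)

def remove_bcgs_from_list(galaxy_mags, bcg_mags):
    """For each BCG, drop the galaxy in the initial list with the closest
    magnitude (an ASSUMED implementation; O(N) per BCG, fine for a sketch)."""
    remaining = list(np.sort(np.asarray(galaxy_mags, dtype=float)))
    for m in bcg_mags:
        i = int(np.argmin(np.abs(np.asarray(remaining) - m)))
        remaining.pop(i)
    return np.array(remaining)
```

Placing the BCGs before any other galaxies, as the text describes, is what lets the remaining list be distributed by the density-based step without double-counting the brightest objects.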

Once the simulation has been populated with galaxies with r-band luminosities assigned, we assign colors to each galaxy in order to mimic photometric surveys. Our method assumes that galaxy colors are set by both luminosity and environment. We first compile a galaxy training set from which we can measure the distribution of colors as a function of luminosity and environment. Here, we take the magnitude-limited spectroscopic SDSS DR6 VAGC galaxy catalog and use the density measurements of Cooper, Tremonti, Newman, & Zabludoff (2008). The local galaxy density is determined by calculating the projected distance to the fifth-nearest galaxy in a Δz bin with a velocity dispersion width of 1000 km s^-1. For each galaxy in our mock catalog, we calculate the same density measurement, identify an SDSS galaxy with similar density and r-band magnitude, and k-correct the colors of this SDSS galaxy to the redshift of the mock galaxy. When choosing densities, we consider the relative densities of galaxies in each redshift bin, which mitigates the difference in the minimum magnitude used to calculate densities in the volume-limited mock and in the magnitude-limited dataset. We restrict our SDSS sample to galaxies at z < 0.2 so that the bias of observed galaxies remains relatively constant over this region.
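The color-assignment step combines a local density estimate, a conversion to relative density within redshift bins, and a nearest-neighbor match to the training set. The numpy sketch below illustrates each piece under stated assumptions: projected coordinates are already in comoving units, the velocity window is taken as |c Δz| < 1000 km s^-1 (ignoring the (1+z) factor), the density uses the common Σ_5 = 5/(π d_5²) convention, and the matching distance weights density percentile against magnitude with a hypothetical scale. All function names are hypothetical. The final k-correction of the matched SDSS colors to the mock redshift requires a k-correction code and is not shown.

```python
import numpy as np

C_KMS = 299_792.458  # speed of light in km/s

def fifth_nn_density(x, y, z, dv_max=1000.0, k=5):
    """Sigma_k = k / (pi d_k^2), with d_k the projected distance to the k-th
    nearest neighbor among galaxies with |c (z_j - z_i)| < dv_max (km/s).
    Galaxies with fewer than k neighbors in the window get NaN."""
    x, y, z = (np.asarray(a, dtype=float) for a in (x, y, z))
    sigma = np.full(len(x), np.nan)
    for i in range(len(x)):
        sel = np.abs(C_KMS * (z - z[i])) < dv_max
        sel[i] = False                                 # exclude the galaxy itself
        d = np.hypot(x[sel] - x[i], y[sel] - y[i])
        if d.size >= k:
            d_k = np.partition(d, k - 1)[k - 1]        # k-th smallest distance
            sigma[i] = k / (np.pi * d_k ** 2)
    return sigma

def density_percentile(sigma, z, z_edges):
    """Rank each density within its redshift bin, scaled to [0, 1], so mock
    and data are compared on relative rather than absolute density."""
    pct = np.full(len(sigma), np.nan)
    which = np.digitize(z, z_edges)
    for b in np.unique(which):
        m = which == b
        ranks = np.argsort(np.argsort(sigma[m]))
        pct[m] = ranks / max(m.sum() - 1, 1)
    return pct

def match_training_galaxy(mock_pct, mock_mag, train_pct, train_mag, mag_scale=1.0):
    """Index of the training galaxy nearest in (density percentile, r-band
    magnitude); mag_scale is a HYPOTHETICAL relative weight."""
    d2 = ((mock_pct[:, None] - train_pct[None, :]) ** 2
          + ((mock_mag[:, None] - train_mag[None, :]) / mag_scale) ** 2)
    return np.argmin(d2, axis=1)
```

Working with percentiles inside each redshift bin is one simple way to realize the relative-density comparison described above; the brute-force loops would need a spatial index (e.g. a k-d tree) for catalogs of realistic size.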

When modeling deep surveys, at low redshifts we must add galaxies dimmer than this algorithm can easily produce; the number density of galaxies approaches (and, at the lowest redshifts, exceeds) the number density of dark matter particles in the simulation. In this version of the catalog, these galaxies (those dimmer than 0.4L*(z)) are placed randomly in the volume, and should not be expected to have clustering properties that match observed galaxies in detail. In addition, the limited depth of current surveys prevents us from compiling a training set of colors for these galaxies. We use the dimmest, bluest galaxies in our training set, which extend down to an absolute r-band magnitude of M_r ≈ -15, and assume that these colors (which are likely somewhat too red) are appropriate for the dimmest galaxies at low redshift.

This algorithm is very successful in reproducing the photometric properties of galaxies in the SDSS. Here, we extend the algorithm to substantially higher redshifts and greater depths to model DES. A number of issues in galaxy evolution have not yet been addressed in creating this catalog, and we highlight them here. At high redshift we simply extrapolate color information from low-redshift galaxies because of the lack of an appropriate training set. In the present version of the catalog, there is no stellar evolution modeling. While galaxy r-band magnitudes are passively evolved and spectra k-corrected, we assume both that the typical rest-frame colors of galaxies are unchanged and that the color-density-luminosity relation remains unchanged. Both of these assumptions are certainly incorrect in detail. Given these limitations, the detailed distribution of photometric errors that comes out of the algorithm at high redshift should be treated with caution. Future catalog versions will address these issues.